Low overhead speculative selection of hot traces in a caching dynamic translator

ABSTRACT

A method and apparatus for selecting hot traces for translation and/or optimization is described in the context of a caching dynamic translator. The code cache stores hot traces. Profiling is done at locations that satisfy a start-of-trace condition, e.g., the targets of backward taken branches. A hot target of a backward taken branch is speculatively identified as the beginning of a hot trace, without the need to profile the blocks that make up the trace. The extent of the speculatively selected hot trace is determined by an end-of-trace condition, such as a backward taken branch or a number of interpreted or native instructions. The interpreter is augmented with a mode in which it emits native instructions that are cached. A trace is cached by identifying a hot start of a trace and then continuing interpretation while storing the emitted native instruction stream until an end-of-trace condition is met.

FIELD OF THE INVENTION

[0001] The present invention relates to techniques for identifying portions of computer programs that are particularly frequently executed. The present invention is particularly useful in dynamic translators needing to identify candidate portions of code for caching and/or optimization.

BACKGROUND

[0002] Dynamic translators translate one sequence of instructions into another sequence of instructions which is executed. The second sequence of instructions consists of ‘native’ instructions: they can be executed directly by the machine on which the translator is running (this ‘machine’ may be hardware or this machine may be defined by software that is running on yet another machine with its own architecture). A dynamic translator can be designed to execute instructions for one machine architecture (i.e., one instruction set) on a machine of a different architecture (i.e., with a different instruction set). Alternatively, a dynamic translator can take instructions that are native to the machine on which the dynamic translator is running and operate on that instruction stream to produce an optimized instruction stream. Also, a dynamic translator can include both of these functions (translation from one architecture to another, and optimization).

[0003] Caching dynamic translators attempt to identify program hot spots (frequently executed portions of the program, such as certain loops) at runtime and use a code cache to store translations of those frequently executed portions. Subsequent execution of those portions can use the cached translations, thereby reducing the overhead of executing those portions of the program.

[0004] A dynamic translator may take instructions in one instruction set and produce instructions in a different instruction set. Or, a dynamic translator may perform optimization: producing instructions in the same instruction set as the original instruction stream; thus, dynamic optimization is a special native-to-native case of dynamic translation. Or, a dynamic translator may do both, converting between instruction sets as well as performing optimization.

[0005] In general, the more sophisticated the execution profiling scheme, the more precise the hot spot identification can be, and hence (i) the smaller the translated code cache space required to hold the more compact set of identified hot spots of the working set of the running program, and (ii) the less time spent translating hot spots into native code (or into optimized native code). Unless special hardware support for profiling is provided, it is generally the case that a more complex profiling scheme will incur a greater overhead. Thus, dynamic translators typically have to strike a balance between minimizing overhead on the one hand and selecting hot spots very carefully on the other.

[0006] Depending on the profiling technique used, the granularity of the selected hot spots can vary. For example, a fine-grained technique may identify single blocks (a straight-line sequence of code without any intervening branches), whereas a more coarse approach to profiling may identify entire procedures. Since there are typically many more blocks that are executed compared to procedures, the latter requires much less profiling overhead (both memory space for the execution frequency counters and the time spent updating those counters) than the former. In systems that are doing program optimization, another factor to consider is the likelihood of useful optimization and/or the degree of optimization opportunity that is available in the selected hot spot. A block presents a much smaller optimization scope than a procedure (and thus fewer types of optimization techniques can be applied), although a block is easier to optimize because it lacks any control flow (branches and joins).

[0007] Traces offer yet a different set of tradeoffs. Traces (also known as paths) are single-entry multi-exit dynamic sequences of blocks. Although traces often have an optimization scope between that for blocks and that for procedures, traces may pass through several procedure bodies, and may even contain entire procedure bodies. Traces offer a fairly large optimization scope while still having simple control flow, which makes optimizing them much easier than a procedure. Simple control flow also allows a fast optimizer implementation. A dynamic trace can even go past several procedure calls and returns, including dynamically linked libraries (DLLs). This allows an optimizer to perform inlining, which is an optimization that removes redundant call and return branches, which can improve performance substantially.

[0008] Unfortunately, without hardware support, the overhead required to profile hot traces using existing methods (such as described by T. Ball and J. Larus in “Efficient Path Profiling”, Proceedings of the 29th Symposium on Micro Architecture (MICRO-29), December 1996) is often prohibitively high. Such methods require instrumenting the program binary (inserting instructions to support profiling), which makes the profiling non-transparent and can result in binary code bloat. Also, execution of the inserted instrumentation instructions slows down overall program execution and once the instrumentation has been inserted, it is difficult to remove at runtime. In addition, such methods require sufficiently complex analysis of the counter values to uncover the hot paths in the program that they are difficult to use effectively on-the-fly while the program is executing. All of these make traditional schemes inefficient for use in a caching dynamic translator.

[0009] Hot traces can also be constructed indirectly, using branch or basic block profiling (as contrasted with trace profiling, where the profile directly provides trace information). In this scheme, a counter is associated with the taken target of every branch (there are other variations on this, but the overheads are similar). When the caching dynamic translator is interpreting the program code, it increments such a counter each time a taken branch is interpreted. When a counter exceeds a preset threshold, its corresponding block is flagged as hot. These hot blocks can be strung together to create a hot trace. Such a profiling technique has the following shortcomings:

[0010] 1. A large counter table is required, since the number of distinct blocks executed by a program can be very large.

[0011] 2. The overhead for trace selection is high. The reason can be intuitively explained: if a trace consists of N blocks, this scheme will have to wait until N counters all exceed their thresholds before they can be strung into a trace. It does not take advantage of the fact that after the first counter gets hot, the next N-1 counters are very likely to get hot in quick succession, making it unnecessary to bother incrementing them and doing the bookkeeping of the past blocks that have just executed.

SUMMARY OF THE INVENTION

[0012] According to the present invention, traces are identified as hot on a speculative basis, rather than based on full trace profile data. The series of blocks beginning at a hot start-of-trace condition and continuing until an end-of-trace condition is identified as a hot trace. Such a trace is identified as hot without the need to incur the overhead of actually measuring whether successive blocks have been executed a sufficient number of times to be considered hot.

[0013] The identification of what constitutes the trace is accomplished as the trace is executed. A translation of the trace is emitted as the trace is being executed, is available for optimization in a system that performs optimization, and is captured in the code cache.

[0014] A particularly useful start-of-trace condition is when the last interpreted branch was backward taken. A useful end-of-trace condition is when one of the following three conditions occurs: (1) the last interpreted branch was backward taken, (2) the number of interpreted branches exceeded a threshold value, or (3) the number of native instructions emitted for the trace exceeded another threshold value.

[0015] Thus, according to the present invention, rather than use higher overhead, sophisticated profiling techniques for identifying program hot spots at runtime, profiling need only be done at certain well-defined program addresses, such as the targets of backward taken branches. When such an address gets hot (i.e., its associated counter exceeds a threshold), the very next sequence of executed blocks (or trace) is speculatively chosen as a hot path.

[0016] This scheme speculatively selects as a hot trace the very next sequence of interpreted blocks following certain hot branch targets, in particular, certain branch targets that are likely to be loop headers. Even though this scheme does not involve elaborate profiling, the quality of the traces selected by this technique can be excellent. One can understand why this scheme is effective as follows: sequences of hot blocks are very often correlated; entire paths tend to get hot in a running program, rather than a disconnected set of blocks.

[0017] The present invention provides a mechanism for trace selection with reduced profiling overhead.

[0018] Another advantage of the present invention is that a trace can be constructed even when it contains indirect branches (branches whose outcomes are known only when the branch is executed, and which cannot be determined by simply decoding the branch instruction). In contrast, it is awkward for trace growing schemes that rely on branch prediction information to deal with indirect branches, because there is no simple way to predict the outcome of such branches.

[0019] A further advantage of the invention is that the memory required for the storage of counters is smaller compared to traditional profiling schemes based on branch or basic block counting, because, with the present invention, it is not necessary to keep track of counts for each block or for each branch.

BRIEF DESCRIPTION OF THE DRAWING

[0020] The invention is pointed out with particularity in the appended claims. The above and other advantages of the invention may be better understood by referring to the following detailed description in conjunction with the drawing, in which:

[0021] FIG. 1 illustrates the components of a dynamic translator such as one in which the present invention can be employed;

[0022] FIG. 2 illustrates the flow of operations in an implementation of a dynamic translator employing the present invention; and

[0023] FIG. 3 shows program flow through four blocks of a program, illustrating that there can be a number of different traces starting with a common block.

DETAILED DESCRIPTION OF AN ILLUSTRATIVE EMBODIMENT

[0024] Referring to FIG. 1, a dynamic translator includes an interpreter 110 that receives an input instruction stream 160. This “interpreter” represents the instruction evaluation engine; it can be implemented in a number of ways (e.g., as a software fetch-decode-eval loop, a just-in-time compiler, or even a hardware CPU).

[0025] In one implementation, the instructions of the input instruction stream 160 are in the same instruction set as that of the machine on which the translator is running (native-to-native translation). In the native-to-native case, the primary advantage obtained by the translator flows from the dynamic optimization 150 that the translator can perform. In another implementation, the input instructions are in a different instruction set than the native instructions.

[0026] The trace selector 120 identifies instruction traces to be stored in the code cache 130. The trace selector is the component responsible for associating counters with interpreted program addresses, determining when to toggle the interpreter state (between normal and trace growing mode), and determining when a “hot trace” has been detected.

[0027] Much of the work of the dynamic translator occurs in an interpreter-trace selector loop. After the interpreter 110 interprets a block of instructions (i.e., until a branch), control is passed to the trace selector 120 to make the observations of the program's behavior so that it can select traces for special processing and placement in the cache. The interpreter-trace selector loop is executed until one of the following conditions is met: (a) a cache hit occurs, in which case control jumps into the code cache, or (b) a hot start-of-trace is reached.

[0028] When a hot start-of-trace is found, the trace selector 120 toggles the state of the interpreter 110 so that the interpreter emits the trace instructions until the corresponding end-of-trace condition (condition (b)) is met. At this point the trace selector invokes the trace optimizer 150. The trace optimizer is responsible for optimizing the trace instructions for better performance on the underlying processor. After optimization is done, the code generator 140 actually emits the trace code into the code cache 130 and returns to the trace selector 120 to resume the interpreter-trace selector loop.
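
By way of illustration only, the interpreter-trace selector loop described above might be sketched as the following toy simulation. It is a minimal sketch under stated assumptions, not the implementation of the illustrative embodiment: the program is represented as a pre-recorded stream of executed block names, every transition between blocks is treated as a taken branch, a branch is treated as backward when its target address is not greater than the address of the block it leaves, and the optimizer and code generator are reduced to storing the selected blocks as a tuple. The names simulate, stream, address, and max_trace_blocks are hypothetical, and exit-branch counters are omitted.

```python
def simulate(stream, address, hot_threshold=50, max_trace_blocks=20):
    """Run a toy interpreter-trace selector loop over a recorded block stream.

    stream  -- names of blocks in the order the program executed them
    address -- block name -> start address (used to spot backward branches)
    """
    cache = {}      # trace start block -> tuple of blocks in the cached trace
    counters = {}   # start-of-trace block -> execution count
    stats = {"interpreted": 0, "cached": 0}
    previous = None
    i = 0
    while i < len(stream):
        block = stream[i]
        if block in cache:
            # (a) Cache hit: follow the cached trace while the program agrees with it.
            trace = cache[block]
            n = 0
            while n < len(trace) and i + n < len(stream) and stream[i + n] == trace[n]:
                n += 1
            stats["cached"] += n
            previous = stream[i + n - 1]
            i += n
            continue
        # Assumption: "backward" means the target is not at a higher address.
        backward = previous is not None and address[block] <= address[previous]
        if backward:
            counters[block] = counters.get(block, 0) + 1
        if backward and counters[block] > hot_threshold:
            # (b) Hot start-of-trace: grow the trace along the path actually taken.
            trace = [block]
            j = i
            while (j + 1 < len(stream)
                   and len(trace) < max_trace_blocks
                   and address[stream[j + 1]] > address[stream[j]]):
                j += 1
                trace.append(stream[j])
            cache[block] = tuple(trace)   # stands in for optimizer and code generator
            del counters[block]           # the counter is no longer needed
            stats["interpreted"] += len(trace)
            previous = stream[j]
            i = j + 1
        else:
            stats["interpreted"] += 1     # normal mode: interpret one block
            previous = block
            i += 1
    return cache, counters, stats


ADDRESS = {"A": 0, "B": 1, "C": 2, "D": 3}
cache, counters, stats = simulate(["A", "B", "D"] * 100, ADDRESS)
```

In this sketch, a loop that executes blocks A, B, D one hundred times is interpreted for 156 blocks (the iterations before the counter for A exceeds 50, plus the one iteration during which the trace is grown), and the remaining 144 blocks run from the cached trace ABD.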

[0029] FIG. 2 illustrates operation of an implementation of a dynamic translator employing the present invention. The solid arrows represent flow of control, while the dashed arrow represents generation of data. In this case, the generated “data” is actually executable sequences of instructions (traces) that are being stored in the translated code cache 130.

[0030] The functioning of the interpreter 110, 210, 245 in the dynamic translator of the illustrative embodiment has been extended so that it has a new operating state (referred to below as “grow trace mode”): when in that new state, the native code for a trace is emitted as a side effect of interpretation. For a native-to-native translation, this process of emitting instructions simply amounts to passing on the relevant instructions from the input instruction stream 160. For other translations, the input instructions are translated into native instructions, and those native instructions are recorded in a buffer. The translated native instructions are then executed and then emitted: the buffer of translated instructions is made available for further processing (i.e., optimization 255 and placement into the cache 260). Although a block is a useful unit to translate, execute, and emit, the interpreter may emit translated instructions in other units, and the interpreter may perform the translate-execute loop on one size (such as instruction or block) and pass translated instructions on for further processing in different units (such as a block or trace). Also, various alternative implementations of an interpreter that emits translated instructions are possible.

[0031] The native code emitted by the interpreter 245 is stored in the translated code cache 130 for execution without the need for interpretation the next time that portion of the program is executed (unless intervening factors have resulted in that code having been flushed from the cache). In FIG. 2, the “normal mode” operation of the interpreter 110 is shown at 210 and the “grow trace mode” operation of the interpreter is shown at 245.

[0032] The grow trace mode 245 of the interpreter 110 is exploited in the present invention as a mechanism for identifying the extent of a trace; not only does grow trace mode generate data (instructions) to be stored in the cache, it plays a role in the trace selection process itself. As described above, the present invention initiates trace selection based on limited profiling: certain addresses that meet start-of-trace conditions are monitored, without the need to maintain profile data for entire traces. A trace is selected based on a hot start-of-trace condition. This selection is speculative, because the actual trace being selected (which will be determined as the interpreter works its way through the trace in grow trace mode) may not be frequently executed, even though it starts at a frequently executed starting address. At the time a trace is identified as being hot (based on the execution counter exceeding a threshold), the extent of the instructions that make up the trace is not known. The process of the interpreter emitting instructions is what maps the extent of the trace; the grow trace mode is used to unravel the trace on the fly.

[0033] For example, referring to FIG. 3, four blocks of a program are shown to illustrate how identification of a trace starting point does not itself fully identify a trace. Block A meets the start-of-trace condition (it is the target of a backward branch from D). With four blocks having the branching relationship shown in FIG. 3, the following traces all share the same starting point (A): ABCD, ABD, ACD. The trace that the program follows at the time that the counter for A becomes hot is the trace that is selected for storage in the cache in response to that counter becoming hot; it could be any of those three traces (actually, there may be more than three possible traces, if the traces continue beyond D).
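
The point that one starting block can head several traces can be checked with a short enumeration. The following is a sketch only: the drawing is not reproduced here, so the forward edges below are an assumption inferred from the three traces listed above (A to B and C, B to C and D, C to D), and the names FORWARD_EDGES and traces_from are hypothetical.

```python
# Assumed forward edges, inferred from the traces ABCD, ABD and ACD named in the text.
# The backward branch from D to A is left out because it ends a trace.
FORWARD_EDGES = {"A": ["B", "C"], "B": ["C", "D"], "C": ["D"], "D": []}


def traces_from(block, end, edges):
    """Return every forward path from block to end, each as a string of block names."""
    if block == end:
        return [block]
    paths = []
    for successor in edges[block]:
        for tail in traces_from(successor, end, edges):
            paths.append(block + tail)
    return paths


candidate_traces = traces_from("A", "D", FORWARD_EDGES)
```

Under these assumed edges the enumeration yields ABCD, ABD and ACD. Which one is cached depends only on the path the program happens to follow when the counter for A becomes hot.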

[0034] Referring to FIG. 2, the dynamic translator starts by interpreting instructions until a taken branch is interpreted 210. At that point, a check is made to see if a trace that starts at the taken branch target exists in the code cache 215. If there is such a trace (i.e., a cache ‘hit’), control is transferred 220 to the top of that version of the trace that is stored in the cache 130.

[0035] When, after executing instructions stored in the cache 130, control exits the cache via an exit branch, a counter associated with the exit branch target is incremented 235 as part of the “trampoline” instruction sequence that is executed in order to hand control back to the dynamic translator. When the trace is formed for storage in the cache 130, a set of trampoline instructions is included in the trace for each exit branch in the trace. These instructions (also known as translation “epilogue”) transfer control from the instructions in the cache back to the interpreter-trace selector loop. An exit branch counter is associated with the trampoline corresponding to each exit branch. Like the storage for the trampoline instructions for a cached trace, the storage for the trace exit counters is also allocated automatically when the native code for the trace is emitted into the translated code cache. In the illustrative embodiment, as a matter of convenience, the exit counters are stored with the trampoline instructions; however, the counter could be stored elsewhere, such as in an array of counters.
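
One way to picture exit counters that are stored with their trampolines is the data-structure sketch below. It is illustrative only and assumes a representation not given in the text: Trampoline, CachedTrace, emit_trace, and their fields are hypothetical names, and the native code is represented as a plain list.

```python
from dataclasses import dataclass, field


@dataclass
class Trampoline:
    """Hands control back to the translator for one exit branch."""
    exit_target: int     # program address the exit branch jumps to
    exit_count: int = 0  # exit counter kept alongside the trampoline

    def take(self):
        # Incrementing the counter is part of leaving the cache.
        self.exit_count += 1
        return self.exit_target


@dataclass
class CachedTrace:
    start: int
    native_code: list
    trampolines: dict = field(default_factory=dict)  # exit target -> Trampoline


def emit_trace(start, native_code, exit_targets):
    """Build a cached trace; counter storage is allocated with each trampoline."""
    trampolines = {target: Trampoline(target) for target in exit_targets}
    return CachedTrace(start, list(native_code), trampolines)


trace = emit_trace(0x40, ["i1", "i2", "i3"], [0x90, 0xA0])
resume_at = trace.trampolines[0x90].take()
```

Because each counter lives inside its trampoline, no separate allocation step is needed for it when the trace is emitted.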

[0036] Referring again to 215 in FIG. 2, if, when the cache is checked for a trace starting at the target of the taken branch, no such trace exists in the cache, then a determination is made as to whether a “start-of-trace” condition exists 230. In the illustrative embodiment, the start-of-trace condition is when the just interpreted branch was a backward taken branch. Alternatively, a system could employ different start-of-trace conditions, combined with or in place of backward taken branches: procedure call instructions, exits from the code cache, system call instructions, or machine instruction cache misses (if the hardware provided some means for tracking such things).

[0037] A backward taken branch is a useful start-of-trace condition because it exploits the observation that the target of a backward taken branch is very likely (though not necessarily) to be the start of a loop. Since most programs spend a significant amount of time in loops, loop headers are good candidates as possible hot spot entrances. Also, since there are usually far fewer loop headers in a program than taken branch targets, the number of counters and the time taken in updating the counters is reduced significantly when one focuses on the targets of backward taken branches (which are likely to be loop headers), rather than on all branch targets.
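
The saving in counters can be seen on a small made-up example. The sketch below assumes that a taken branch is backward when its target address is not greater than the address of the branch itself; the text does not spell out this test, and the addresses in taken_branches are invented for illustration.

```python
def is_backward(branch_address, target_address):
    """Assumed test: a taken branch is backward if it does not move to a higher address."""
    return target_address <= branch_address


# Hypothetical (branch address, target address) pairs for taken branches.
taken_branches = [
    (0x10, 0x40),  # forward
    (0x48, 0x60),  # forward
    (0x6C, 0x40),  # backward: likely closes a loop headed at 0x40
    (0x50, 0x80),  # forward
    (0x8C, 0x40),  # backward: another path back to the same header
]

all_targets = {target for _, target in taken_branches}
backward_targets = {
    target for branch, target in taken_branches if is_backward(branch, target)
}
```

Here a scheme that counts every taken branch target would need three counters, while counting only the targets of backward taken branches needs one.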

[0038] If the start-of-trace condition is not met, then control re-enters the basic interpreter state and interpretation continues. In this case, there is no need to maintain a counter; a counter increment takes place only if a start-of-trace condition is met. This is in contrast to conventional dynamic translator implementations that have maintained counters for each branch target. In the illustrative embodiment counters are only associated with the address of the backward taken branch targets and with targets of branches that exit the translated code cache; thus, the present invention permits a system to use less counter storage and to incur less counter increment overhead.

[0039] If the determination of whether a “start-of-trace” condition exists 230 is that the start-of-trace condition is met, then, if a counter for the target does not exist, one is created; if a counter for the target does exist, that counter is incremented.

[0040] If the counter value for the branch target does not exceed the hot threshold 240, then control re-enters the basic interpreter state and interpretation continues 210.

[0041] If the counter value does exceed a hot threshold 240, then this branch target is the beginning of what will be deemed to be a hot trace. At this point, that counter value is no longer needed, and that counter can be recycled (alternatively, the counter storage could be reclaimed for use for other purposes). This is an advantage over profiling schemes that involve instrumenting the binary.
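
The decisions made at 215, 230 and 240 after each taken branch might be sketched as a single function. This is a minimal sketch, assuming that a newly created counter starts at a count of one (the text does not state the initial value) and using the threshold of 50 reported for the illustrative embodiment. The names Action and on_taken_branch are hypothetical.

```python
from enum import Enum


class Action(Enum):
    ENTER_CACHE = "enter cache"  # 220: jump to the cached trace
    INTERPRET = "interpret"      # 210: stay in normal mode
    GROW_TRACE = "grow trace"    # 245: switch to grow trace mode


HOT_THRESHOLD = 50  # value reported for the illustrative embodiment


def on_taken_branch(target, backward, cache, counters, threshold=HOT_THRESHOLD):
    """Decide what to do after a taken branch to `target` has been interpreted."""
    if target in cache:                  # 215: cache hit
        return Action.ENTER_CACHE
    if not backward:                     # 230: no start-of-trace, so no counter
        return Action.INTERPRET
    # Create the counter on first use (assumed to start at one), else increment it.
    counters[target] = counters.get(target, 0) + 1
    if counters[target] > threshold:     # 240: hot start-of-trace
        return Action.GROW_TRACE
    return Action.INTERPRET


counters = {}
decisions = [on_taken_branch(0x40, True, set(), counters) for _ in range(51)]
```

Note that a forward branch never touches the counter table in this sketch, which is where the saving in counter storage and increment overhead comes from.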

[0042] Because the profile data that is being collected by the start-of-trace counters is consumed on the fly (as the program is being executed), each counter can be recycled when its information is no longer needed; in particular, once a start-of-trace counter has become hot and has been used to select a trace for storage in the cache, that counter can be recycled. The illustrative embodiment includes a fixed size table of start-of-trace counters. The table is associative: each counter can be accessed by means of the start-of-trace address for which the counter is counting. When a counter for a particular start-of-trace is to be recycled, that entry in the table is added to a free list, or otherwise marked as free.
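
A fixed size associative counter table with a free list could look like the sketch below. It is one possible arrangement under stated assumptions, not the table of the illustrative embodiment: the class name CounterTable and its methods are hypothetical, a Python dictionary stands in for the associative lookup, and the behavior when the table is full (returning None) is an assumption, since the text does not say what happens in that case.

```python
class CounterTable:
    """Fixed-size table of start-of-trace counters, looked up by address."""

    def __init__(self, capacity):
        self.counts = [0] * capacity            # fixed storage for the counters
        self.slot_of = {}                       # start-of-trace address -> slot index
        self.free_slots = list(range(capacity)) # free list of unused slots

    def increment(self, address):
        """Create or increment the counter for address; return its new value."""
        slot = self.slot_of.get(address)
        if slot is None:
            if not self.free_slots:
                return None  # assumed policy: drop the sample when the table is full
            slot = self.free_slots.pop()
            self.slot_of[address] = slot
            self.counts[slot] = 0
        self.counts[slot] += 1
        return self.counts[slot]

    def recycle(self, address):
        """Return the counter for address to the free list."""
        slot = self.slot_of.pop(address)
        self.free_slots.append(slot)


table = CounterTable(capacity=2)
first = table.increment(0x100)
second = table.increment(0x100)
```

Recycling a hot counter immediately makes its slot available to the next address that meets a start-of-trace condition, so the table can stay small.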

[0043] The lower the threshold, the less time is spent in the interpreter, and the greater the number of start-of-traces that potentially get hot. This results in a greater number of traces being generated into the code cache (and the more speculative the choice of hot traces), which in turn can increase the pressure on the code cache resources, and hence the overhead of managing the code cache. On the other hand, the higher the threshold, the greater the interpretive overhead (e.g., allocating and incrementing counters associated with start-of-traces). Thus the choice of threshold has to balance these two forces. It also depends on the actual interpretive and code cache management overheads in the particular implementation. In our specific implementation, where the interpreter was written as a software fetch-decode-eval loop in C, a threshold of 50 was chosen as the best compromise.

[0044] If the counter value does exceed a hot threshold 240, then, as indicated above, the address corresponding to that counter will be deemed to be the start of a hot trace. At the time the trace is identified as hot, the extent of the trace remains to be determined (by the end-of-trace condition described below). Also, note that the selection of the trace as ‘hot’ is speculative, in that only the initial block of the trace has actually been measured to be hot.

[0045] At this point, the interpreter transitions from normal mode 210 to grow trace mode 245. In this mode, as described above, interpretation continues, except that as instructions are interpreted, the native translation of the instructions is also emitted so that it can be stored in the code cache 130. The interpreter stores the native instructions into a buffer. When an end-of-trace condition is reached 250, the buffer with the complete trace is handed to an optimizer 255. After optimization, the optimized native instructions are placed into the cache, and the counter storage associated with the trace's starting address is recycled 260. (Alternatively, the counter storage could be recycled as early as when the counter has been determined to exceed the hot threshold.) Also, triggered by the end-of-trace condition, the interpreter 110 transitions back to the normal interpreter state.
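
The two interpreter modes might be sketched as follows. This is a simplified sketch: the class Interpreter, its methods, and the translate callback (a hypothetical function mapping one input instruction to a list of native instructions) are assumptions for illustration, and actual execution of the instructions is left out.

```python
class Interpreter:
    """Interpreter with a normal mode and a grow trace mode."""

    def __init__(self, translate):
        self.translate = translate  # hypothetical: input instruction -> native list
        self.growing = False        # False is normal mode, True is grow trace mode
        self.buffer = []            # native instructions emitted for the current trace

    def start_trace(self):
        """Enter grow trace mode with an empty buffer."""
        self.growing = True
        self.buffer = []

    def interpret(self, instruction):
        native = self.translate(instruction)
        # Execution of `native` would happen here; it is omitted in this sketch.
        if self.growing:
            self.buffer.extend(native)  # emission is a side effect of interpretation

    def end_trace(self):
        """Leave grow trace mode and hand back the buffered trace."""
        trace, self.buffer = self.buffer, []
        self.growing = False
        return trace


# Native-to-native case: emitting simply passes the input instruction on.
interp = Interpreter(lambda instruction: [instruction])
interp.interpret("add")      # normal mode: nothing is buffered
interp.start_trace()
interp.interpret("load")
interp.interpret("branch")
grown = interp.end_trace()   # would be handed to the optimizer
```

The buffer returned by end_trace corresponds to the complete trace that is handed to the optimizer and then placed in the cache.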

[0046] An end-of-trace condition is simply a heuristic that says when to stop growing the current trace. The following are some examples of possible end-of-trace conditions: ending a trace when a backward taken branch is reached avoids unfolding cycles unnecessarily and also captures loops; a “return” branch can be a useful end-of-trace because it can indicate the end of a procedure body; generally, it is desirable to trigger an end-of-trace if a new start-of-trace has occurred.

[0047] In the illustrative embodiment, the end-of-trace condition is met when (a) a backward taken branch is interpreted, or (b) a certain number of branch instructions has been interpreted (in the illustrative embodiment this number is 20) since entering the grow trace mode (capping the number of branches on the trace limits the number of places that control can exit the trace; the greater the number of branches that can exit the trace, the less the likelihood that the entire trace is going to be utilized and the greater the likelihood of an early trace exit), or (c) a certain number of native translated instructions has been emitted into the code cache for the current trace. The limit on the number of instructions in a trace is chosen to avoid excessively long traces. In the illustrative embodiment, this is 1024 instructions, which allows a conditional branch on the trace to reach its extremities (this follows from the number of displacement bits in the conditional branch instruction on the PA-RISC processor, on which the illustrative embodiment is implemented).
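
The three-part end-of-trace test can be written as a single predicate. The limits of 20 branches and 1024 instructions are those given for the illustrative embodiment; treating each limit as reached when the count equals it (rather than strictly exceeds it) is an assumption, and the function and constant names are hypothetical.

```python
MAX_TRACE_BRANCHES = 20        # branch cap from the illustrative embodiment
MAX_TRACE_INSTRUCTIONS = 1024  # native instruction cap from the illustrative embodiment


def end_of_trace(last_branch_backward_taken, branches_interpreted, native_emitted):
    """Return True when the interpreter should stop growing the current trace.

    branches_interpreted -- branches interpreted since entering grow trace mode
    native_emitted       -- native instructions emitted for the current trace
    """
    return (
        last_branch_backward_taken                    # (a) loop closed
        or branches_interpreted >= MAX_TRACE_BRANCHES # (b) too many exits
        or native_emitted >= MAX_TRACE_INSTRUCTIONS   # (c) trace too long
    )


keep_growing = not end_of_trace(False, 3, 120)
```

A predicate of this shape would be evaluated after each interpreted branch while the interpreter is in grow trace mode.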

[0048] Although the cache can be sized large enough so that replacement of entries is not required, typically a replacement scheme will be used. One approach is to flush the cache when it is full and space for a new entry is needed. However, another approach that offers advantages is to flush the cache preemptively, based on some indication that the program's working set is changing. Such a preemptive approach is described in the co-owned application titled “A Preemptive Replacement Strategy For A Caching Dynamic Translator,” Sanjeev Banerjia, Vasanth Bala, and Evelyn Duesterwald, filed the same date as the present application.

[0049] When a trace is removed from the code cache, the memory used for counter storage for each of the trace's exit branches is automatically recovered. Thus, the storage for these exit branch target counters is “free” in that sense, because they do not have to be independently allocated and managed like the other counters associated with interpreted branch targets (those targets that have met start-of-trace conditions, but for which the associated counter has not yet exceeded the “hot” threshold); as discussed above, the exit branch target counters are allocated as a part of creating the trampoline for the exit branch.

[0050] In the illustrative embodiment, FIGS. 1 and 2 are related as follows; one skilled in the art will appreciate that these functions can be organized in other ways in other implementations. The interpreter 110 implements 210 and 245. The code generator 140 implements 260. The trace optimizer 150 implements 255. The trace selector 120 implements 215, 220, 230, 235, 240, and 250.

[0051] The illustrative embodiment of the present invention is implemented as software running on a general purpose computer, and the present invention is particularly suited to software implementation. Special purpose hardware can also be useful in connection with the invention (for example, a hardware ‘interpreter’, hardware that facilitates collection of profiling data, or cache hardware).

[0052] The foregoing has described a specific embodiment of the invention. Additional variations will be apparent to those skilled in the art. For example, although the invention has been described in the context of a dynamic translator, it can also be used in other systems that employ interpreters or just-in-time compilers (JITs). Further, the invention could be employed in other systems that emulate any non-native system, such as a simulator. Thus, the invention is not limited to the specific details and illustrative example shown and described in this specification. Rather, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.

We claim:
 1. In a dynamic translator, a method for selecting hot traces in a program being translated comprising the steps of: (A) dynamically associating counters with addresses in the program being translated that are determined, during program translation and execution, to meet a start-of-trace condition; (B) when an instruction with a corresponding counter is executed, incrementing that counter, and (C) when a counter exceeds a threshold, determining the particular trace (of the possible plurality of traces beginning at that address) that begins at the address corresponding to that counter and is defined by the path of execution taken by the program following that particular execution of that instruction and continuing until an end-of-trace condition is met and identifying that trace as a hot trace.
 2. The method of claim 1 in which the dynamic translator includes an interpreter that can be switched between a normal mode and a grow trace mode in which the interpreter emits native instructions as a side-effect of interpretation, the method further comprising the steps of: (D) when a counter exceeds a threshold, switching the interpreter to grow trace mode; (E) when the interpreter is in grow trace mode and an end-of-trace condition is met, switching the interpreter to normal mode; (F) using the instructions emitted by the interpreter to determine the trace that is to be identified as a hot trace.
 3. The method of claim 1 in which, in response to identifying a trace as a hot trace, the corresponding counter is recycled.
 4. The method of claim 1 in which the start-of-trace condition is when the last interpreted branch was backward taken.
 5. The method of claim 2 in which the start-of-trace condition is when the last interpreted branch was backward taken.
 6. In a dynamic translator comprising: (A) counters for storing counts of the number of times that instructions are executed at addresses associated with the counters; (B) means for identifying addresses that meet a start-of-trace condition and for associating such addresses with counters; (C) means for determining when a counter exceeds a threshold; and (D) trace identification means for identifying a series of instructions executed following the instruction at the address associated with the counter as a hot trace.
 7. The dynamic translator of claim 6 further comprising an interpreter that can be switched between a normal mode and a grow trace mode in which the interpreter emits native instructions as a side-effect of interpretation, and in which the interpreter is switched to the grow trace mode in response to a counter exceeding a threshold, and in which the trace identification means uses the emitted instructions in its identification of a hot trace.
 8. The translator of claim 6 in which, in response to identifying a trace as a hot trace, the corresponding counter is recycled.
 9. The translator of claim 6 in which the start-of-trace condition is when the last interpreted branch was backward taken.
 10. The translator of claim 7 in which the start-of-trace condition is when the last interpreted branch was backward taken.
 11. In a dynamic translator, a method for selecting hot traces comprising the steps of: (A) maintaining counts for addresses that meet a start-of-trace condition; (B) detecting when one of these counters exceeds a threshold; (C) in response to a counter exceeding a threshold, identifying as a hot trace the instructions beginning with the address with which that counter was associated and continuing until reaching an end-of-trace condition.
 12. In a dynamic translator having a cache, a method for selecting hot traces comprising the steps of: (A) when a backward branch is taken, if a counter does not exist for the branch target, then creating such a counter, if such a counter does exist, then incrementing the counter; (B) if a counter exceeds a threshold, then storing in the cache a translation of those instructions executed starting at the branch target associated with the counter that exceeded the threshold and continuing until an end-of-trace condition is reached.
 13. The method of claim 12 in which, when a translation is stored in the cache, the corresponding counter is recycled.